crz1hog1 condition-by-time interaction.genes.double.voom.ET.vs.FK506.csv.Condition-by-time interaction.genes.double.voom.crz1.ET.vs.crz1hog1.ET.csv,Condition-by-time interaction.genes.sva.double.voom.crz1.ET.vs.crz1hog1.ET.csv,Alignment of crz1hog1.ET and crz1hog1.FK506 RNA-seq data to S. cerevisae (Scer3) and generate fpkm table to all genes, including gene names along with ORF IDs in the file.
For new experiment samples (crz1hog1.FK506 vs crz1hog1.ET), Calculate fold change values of each time point to the corresonding 0 min sample, and determine whihc changes are significantly different from 0
Determine whether the fold change at each time point relative to time 0 is significantly different between crz1hog1.ET samples and crz1hog1.FK506 experiments, for all genes (This is to test interaction between time and strains)
Extract sets of genes that we are interested in and calcuate average expression at each time point, generate heatmaps
Determine whether the fold change at each time point relative to time 0 is significantly different between crz1hog1.ET samples and previous crz1.ET experiments, for all genes (This is to test interaction between time and strains) CAUTION: Because the two sets of samples were processed in totally different batches, btach effects might confound with biological effects.
In experiments of crz1hog1.FK506, CaCl2 is added to a culture and samples are taken at 3 time points – 10, 30, 60 minutes, with time 0 as baseline
| experiments | sample name | description |
|---|---|---|
| pre_1 and pre_4 | crz1.ET | Previous control samples |
| A and C | crz1hog1.ET | New control samples |
| B and D | crz1hog1.FK506 | New samples treated with FK506 inhibitor |
| file | sample | time | group | batch |
|---|---|---|---|---|
| 1-0 | 1 | 0 | crz1.ET | 3 |
| 1-10 | 1 | 10 | crz1.ET | 3 |
| 1-30 | 1 | 30 | crz1.ET | 3 |
| 1-60 | 1 | 60 | crz1.ET | 3 |
| 4-0 | 4 | 0 | crz1.ET | 4 |
| 4-10 | 4 | 10 | crz1.ET | 4 |
| 4-30 | 4 | 30 | crz1.ET | 4 |
| 4-60 | 4 | 60 | crz1.ET | 4 |
| A-0 | A | 0 | crz1hog1.ET | 1 |
| A-10 | A | 10 | crz1hog1.ET | 1 |
| A-30 | A | 30 | ccrz1hog1.ET | 1 |
| A-60 | A | 60 | crz1hog1.ET | 1 |
| B-0 | B | 0 | crz1hog1.FK506 | 1 |
| B-10 | B | 10 | crz1hog1.FK506 | 1 |
| B-30 | B | 30 | crz1hog1.FK506 | 1 |
| B-60 | B | 60 | crz1hog1.FK506 | 1 |
| C-0 | C | 0 | crz1hog1.ET | 2 |
| C-10 | C | 10 | crz1hog1.ET | 2 |
| C-30 | C | 30 | ccrz1hog1.ET | 2 |
| C-60 | C | 60 | crz1hog1.ET | 2 |
| D-0 | D | 0 | crz1hog1.FK506 | 2 |
| D-10 | D | 10 | crz1hog1.FK506 | 2 |
| D-30 | D | 30 | crz1hog1.FK506 | 2 |
| D-60 | D | 60 | crz1hog1.FK506 | 2 |
All files could be downloaded from here.
Here we call voom and duplicateCorrelation twice each to treat sample as a random effect. This can be done in limma using the duplicateCorrelation function. voom is used to prepare the inputs for lmFit. Function DESeq2::DESeqDataSetFromHTSeqCount is used to import count matrix from HTseq-count results.
#### read crz1hog1.ET and crz1hog1.FK506 count table created by Haibo
countTable <- read.delim("crz1hog1.ET+crz1hog1.FK506.count.table.txt", as.is =T, row.name=1)
#### read metadata
sampleTable <- read.delim("AUG27.metadata.txt", as.is=T)
sampleTable$lev <- paste(sampleTable$Group, sampleTable$Time, sep ="_")
sampleTable$lev <- as.factor(sampleTable$lev)
sampleTable$Batch <- as.factor(sampleTable$Batch)
head(sampleTable)
## SampleID Sample Time Group Batch lev
## 1 A_0 A 0 crz1hog1.ET 1 crz1hog1.ET_0
## 2 A_10 A 10 crz1hog1.ET 1 crz1hog1.ET_10
## 3 A_30 A 30 crz1hog1.ET 1 crz1hog1.ET_30
## 4 A_60 A 60 crz1hog1.ET 1 crz1hog1.ET_60
## 5 B_0 B 0 crz1hog1.FK506 1 crz1hog1.FK506_0
## 6 B_10 B 10 crz1hog1.FK506 1 crz1hog1.FK506_10
We have countTable and sampleTable now. We will call voom and duplicateCorrelation twice.
Can you explain to me what the different columns are in the tables?
A: In the table gene.diff.double.voom.csv, we compared samples to corresponding time 0 for each time point. For each comparison, there are 7 columns. The column names will be, for example, for crz1.ET sample time 10 point, crz1hog1.ET_10.FoldChange, crz1hog1.ET_10.logFC, crz1hog1.ET_10.AveExpr, crz1hog1.ET_10.t, crz1hog1.ET_10.P.Value, crz1hog1.ET_10.adj.P.Val, crz1hog1.ET_10.B. crz1hog1.ET_10 is the prefix, which indicates the sample and time. And the suffix meaning is showning following table
| suffix | description |
|---|---|
| FoldChange | fold change vs. time 0 |
| logFC | log2 fold change vs. time 0 |
| AveExpr | average log2-expression for the gene over all samples in comparison |
| t | moderated t-statistic |
| P.Value | raw p-value |
| adj.P.Val | adjusted p-value |
| B | log-odds that the gene is differentially expressed |
## make model matrix and contrast matrix
#### amke sure the rows of the metadata and the colnums of the couttable match
## all(sampleTable$SampleID == colnames(countTable))
design <- model.matrix(~0 + lev + Batch, data=sampleTable)
colnames(design) <- gsub("^lev", "", colnames(design))
#### filtering genes with too low expression (Jianhong didn't filtering the counttable, I also skip it)
## keep <- rowSums(cpm(countTable)>1) >= 2
## countTable <- countTable[keep, ] ## 6164 genes kept
y <- DGEList(countTable)
y <- calcNormFactors(y)
v <- voom(y, design)
corfit <- duplicateCorrelation(v, design, block=sampleTable$Sample)
#corfit$consensus
v <- voom(y, design, block=sampleTable$Sample, correlation=corfit$consensus)
corfit2 <- duplicateCorrelation(v, design, block=sampleTable$Sample)
#corfit2$consensus
fit <- lmFit(v, design, block=sampleTable$Sample, correlation=corfit2$consensus)
## make contrasts
contrasts <- colnames(design)[-(ncol(design))]
keep <- grepl("\\_0$", contrasts)
A <- contrasts[!keep]
B <- contrasts[keep]
B <- rep(B, each=3)
contrasts <- paste(A, B, sep="-")
names(contrasts) <- A
contrasts
## crz1hog1.ET_10 crz1hog1.ET_30
## "crz1hog1.ET_10-crz1hog1.ET_0" "crz1hog1.ET_30-crz1hog1.ET_0"
## crz1hog1.ET_60 crz1hog1.FK506_10
## "crz1hog1.ET_60-crz1hog1.ET_0" "crz1hog1.FK506_10-crz1hog1.FK506_0"
## crz1hog1.FK506_30 crz1hog1.FK506_60
## "crz1hog1.FK506_30-crz1hog1.FK506_0" "crz1hog1.FK506_60-crz1hog1.FK506_0"
cm <- makeContrasts(contrasts=contrasts, levels=design)
fit2 <- contrasts.fit(fit, cm)
fit2 <- eBayes(fit2)
results <- sapply(contrasts, topTable,
fit=fit2, number=nrow(fit2), sort.by="none",
simplify=FALSE)
results2 <- mapply(function(.ele, .n){
cbind(FoldChange=2^.ele$logFC, .ele)
}, results, names(results), SIMPLIFY=FALSE)
rown <- sapply(results2, rownames)
#dim(rown)
rown <- apply(rown, 1, unique)
#length(rown) ## all rowname are identical.
results2 <- do.call(cbind, results2)
#dim(results2)
# Yeast gene symbols and gene ID
geneSymbol <- read.delim("Yeast.gene.symbols.txt", header =TRUE, as.is =TRUE)
results2$GeneID <- rownames(results2)
results2 <- merge(geneSymbol, results2, by.x= "Gene.stable.ID", by.y = "GeneID", all.y=TRUE)
write.csv(results2, "crz1hog1.Fk506 and crz1hog1.ET within-condition.differential.expression.genes.double.voom.csv")
crz1hog1 condition-by-time interaction.genes.double.voom.ET.vs.FK506.csv.Can you explain to me what the different columns are in the tables?
A: This test of interaction between condition and time. For example, in table crz1hog1 condition-by-time interaction.genes.double.voom.ET.vs.FK506.csv, we compared crz1hog1.ET with crz1hog1.FK506 for corresponding time points which is normalized by time 0. The expression is, for example time 10, (crz1.ET_10 - crz1.ET_0) - (crz1.FK506_10 - crz1.FK506_0). The meaning of suffix is same as above.
## load data
f <- "JB_target_lists.xlsx"
sheets <- getSheets(loadWorkbook(f))
## can not use xlsx::read.xlsx, because error genereated for empty cells
geneset <- lapply(names(sheets)[-1],read.xls, xls = f, stringsAsFactors=FALSE)
names(geneset) <- names(sheets)[-1]
## get logFC for each time point and condition relative to time 0
fc <- lapply(results, function(.ele) .ele$logFC)
fc <- do.call(cbind, fc)
rownames(fc) <- rownames(results[[1]])
foldchange <- lapply(geneset, function(.ele) {
## if filtering the count table this will cause an error
this.fc <- fc[.ele[, "YORF"], ]
rownames(this.fc) <- ifelse(is.na(.ele[, "NAME"]),
.ele[, "YORF"],
.ele[, "NAME"])
this.fc
})
For p value calculation, please refer1.
Can you give me data files corresponding the gene lists so that I can use to re-generate the plots labeled “change curves” for different gene sets? What exactly are the values that are plotted here? Are these data from the table gene.diff.double.voom.csv?
A: The data for plot are saved in the folder crz1hog1.ET.and.crzhog1.FK506.changeCurve. The points are mean value of expression of the gene cluster. The error bar is the the 95% confidence interval. Gene expressions are from the table gene.diff.double.voom.csv.
changeCurve <- "crz1hog1.ET.and.crzhog1.FK506.changeCurve"
dir.create(changeCurve)
null <- mapply(function(log2fc, genesetName){
## calculate p value
rn <- nrow(log2fc)
cn <- ncol(log2fc)/2
lfc <- list(
"ET"=log2fc[, 1:cn],
"FK506"=log2fc[, cn+(1:cn)]
)
cn <- combn(2, 2, simplify = FALSE)
pval <- sapply(cn, function(.ele){
data <- do.call(rbind, lfc[.ele])
dataname <- names(lfc)[.ele]
group.labels <- rep(dataname, each=rn)
# define Mahalanobis distance matrix by ecodist::distance:
dmat <- as.matrix(distance(data,"mahalanobis"))
out <- t(DBF.test(dmat,group.labels,nrow(data)))
rownames(out) <- paste(dataname, collapse=".vs.")
out
}, simplify = FALSE)
pval <- do.call(rbind, pval)
cat(c("geneset:", genesetName, "\r\n"))
print(pval)
fc.colmean <- colMeans(log2fc)
fc.sd <- apply(log2fc, 2, sd)
fc.se <- fc.sd/sqrt(nrow(log2fc))
fc.mf <- do.call(rbind, strsplit(names(fc.colmean), "[._]"))
fc.mf <- data.frame(sample=fc.mf[, 2],
time=as.numeric(fc.mf[, 3]),
value=fc.colmean,
sd=fc.sd,
ci95=fc.se*2)
time0 <- data.frame(sample=levels(fc.mf$sample), time=0, value=0, sd=0, ci95=0)
fc.mf <- rbind(fc.mf, time0)
fc.mf <- fc.mf[with(fc.mf, order(sample, time)), ]
null <- write.csv(fc.mf, file.path(changeCurve, paste0(make.names(genesetName), ".csv")))
p <- ggplot(fc.mf, aes(x=time, y=value, group=sample)) +
geom_line(aes(linetype=sample, color=sample)) +
geom_point(aes(color=sample), size=5) +
geom_errorbar(aes(ymin=value-ci95, ymax=value+ci95, color=sample), width=1) +
scale_color_brewer(palette="Dark2") +
theme_bw() + labs(title=genesetName) + ylab("log2 fold change")
print(p)
ggsave(file.path(changeCurve, paste0(make.names(genesetName), ".pdf")),
width=8, height=6)
}, foldchange, names(foldchange))
## geneset: all SBFMBF targets
## dbf.statistic dbf.p.value
## ET.vs.FK506 -0.005028325 0.8691003
## geneset: SBF & MBF
## dbf.statistic dbf.p.value
## ET.vs.FK506 0.03079396 0.2631338
## geneset: MBF only
## dbf.statistic dbf.p.value
## ET.vs.FK506 0.07723127 0.004924197
## geneset: SBF only
## dbf.statistic dbf.p.value
## ET.vs.FK506 -0.04196558 0.9999972
## geneset: Hcm1 targets
## dbf.statistic dbf.p.value
## ET.vs.FK506 0.0001098942 0.5970638
## geneset: Clb2 cluster
## dbf.statistic dbf.p.value
## ET.vs.FK506 0.2738946 2.243427e-06
## geneset: Mcm cluster
## dbf.statistic dbf.p.value
## ET.vs.FK506 0.00716932 0.5479305
## geneset: all Ace2Swi5 targets
## dbf.statistic dbf.p.value
## ET.vs.FK506 -0.02149212 1
## geneset: Ace2 & Swi5
## dbf.statistic dbf.p.value
## ET.vs.FK506 -0.03399481 0.9732827
## geneset: Ace2 only
## dbf.statistic dbf.p.value
## ET.vs.FK506 -0.0205176 0.9963217
## geneset: Swi5 only
## dbf.statistic dbf.p.value
## ET.vs.FK506 -0.02021334 0.934495
## geneset: Hog1Msn targets
## dbf.statistic dbf.p.value
## ET.vs.FK506 0.2460604 0
Heatmaps are rescaled for each genes.
What data is depicted in the heat maps? I don’t understand where the yellow (increases) are coming from, since many of those genes decrease and don’t increase over time.
A: The source data for heatmap are logFC data compare to time 0. They are saved in the folder heatmap. To emphasize the change of each gene, heatmaps are rescaled by row (each gene). The un-scaled figure can be find in next section.
library(pheatmap)
heatmap <- "crz1hog1.ET and crz1hog1.Fk506.heatmap"
dir.create(heatmap)
null <- mapply(function(log2fc, genesetName){
null <- write.csv(log2fc, file.path(heatmap, paste0(make.names(genesetName), ".csv")))
pheatmap(log2fc, color=colorRampPalette(c("#22B8E2", "#000000", "#FEFB30"))(11),
cluster_cols = FALSE, treeheight_row=0, border_color = NA, scale="row")
pheatmap(log2fc, color=colorRampPalette(c("#22B8E2", "#000000", "#FEFB30"))(11),
cluster_cols = FALSE, treeheight_row=0, border_color = NA, scale="row",
filename = file.path(heatmap, paste0(make.names(genesetName), ".pdf")),
width=4, height=nrow(log2fc)/10+2)
}, foldchange, names(foldchange))
null <- mapply(function(log2fc, genesetName){
pheatmap(log2fc, color=colorRampPalette(c("#22B8E2", "#000000", "#FEFB30"))(11),
cluster_cols = FALSE, treeheight_row=0, border_color = NA, scale="none")
pheatmap(log2fc, color=colorRampPalette(c("#22B8E2", "#000000", "#FEFB30"))(11),
cluster_cols = FALSE, treeheight_row=0, border_color = NA, scale="none",
filename = file.path(heatmap, paste0(make.names(genesetName), ".unscaled.pdf")),
width=4, height=nrow(log2fc)/10+2)
}, foldchange, names(foldchange))
directory <- "htseq" ## htseq-count outputs
sampleTable <- read.delim("design.txt") ## design table
## assign fileName to design table
sampleTable$fileName <- paste0("tophat2.sc3.", sampleTable$file, ".count.tab")
colnames(sampleTable)[1] <- "sampleName"
sampleTable <- sampleTable[, c(1, 6, 2:5)]
#head(sampleTable)
## we only want count table, so design is not useful at this step.
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable, directory, design=~group)
##ddsHTSeq
countTable <- counts(ddsHTSeq)
#dim(countTable)
#head(countTable)
##export crz1.ET countTable (Haibo)
crz1 <- subset(sampleTable, group == "crz1.ET" & time %in% c(0, 10,30,60))
count.table.crz1 <- countTable[,crz1$sampleName]
colnames(count.table.crz1) <- gsub("-", "_", paste("Pre",colnames(count.table.crz1), sep="_"))
crz1$sampleName <- gsub("-", "_", paste("Pre",crz1$sampleName, sep="_"))
write.table(crz1, "Old.crz1.ET.metadata.txt", row.names=F, quote = F, sep="\t")
write.table(count.table.crz1, "Old.crz1.ET.count.table.txt", row.names=T, quote = F, sep="\t")
####
meta.crz1 <- read.delim("Old.crz1.ET.metadata.txt", as.is =T)
## metadata for AUG 27 2017 samples
meta <- read.delim("AUG27.metadata.txt", as.is=T)[1:16,]
meta.crz1 <- meta.crz1[, c(1,3:6)]
colnames(meta.crz1) <- colnames(meta)
meta.crz1$Batch <- ifelse(meta.crz1$Batch==1, 3, meta.crz1$Batch)
meta.crz1$Batch <- ifelse(meta.crz1$Batch==2, 4, meta.crz1$Batch)
meta.crz1.crz1hog1 <- rbind(meta.crz1, subset(meta, Group=="crz1hog1.ET"))
meta.crz1.crz1hog1$lev <- as.factor(with(meta.crz1.crz1hog1, paste(Group, Time, sep="_")))
final.meta <- meta.crz1.crz1hog1
#head(final.meta)
###countTable for crz1hog1.ET
count.crz1hog1 <-read.delim("crz1hog1.ET+crz1hog1.FK506.count.table.txt", as.is =T)
##### merge
count.table.crz1 <- as.data.frame(count.table.crz1)
count.table.crz1$GeneID <- rownames(count.table.crz1)
final.count <- merge(count.table.crz1,count.crz1hog1, by="GeneID" )
rownames(final.count) <- final.count[, 1]
final.count <- as.matrix(final.count[,-1])
## only ET samples
final.count <- final.count[, grepl("^Pre|^A|^C", colnames(final.count))]
sampleTable <- final.meta
countTable <- final.count
head(countTable)
## Pre_1_0 Pre_1_10 Pre_1_30 Pre_1_60 Pre_4_0 Pre_4_10 Pre_4_30
## Q0010 0 0 0 0 0 0 0
## Q0017 0 0 0 0 0 0 0
## Q0032 0 0 0 1 2 1 0
## Q0045 2379 2634 2722 2424 2055 2375 2100
## Q0050 3432 4297 4282 3798 4141 4180 3886
## Q0055 4971 6081 5907 5687 5156 5313 5071
## Pre_4_60 A_0 A_10 A_30 A_60 C_0 C_10 C_30 C_60
## Q0010 1 1 3 8 7 5 2 5 1
## Q0017 0 0 0 0 0 0 0 0 0
## Q0032 0 3 0 0 1 0 2 0 0
## Q0045 2211 4689 4061 5540 6760 4569 4462 6728 7141
## Q0050 4070 14962 13385 17844 21886 17917 18463 24903 22843
## Q0055 5075 12379 11656 15447 20269 13678 14155 18918 19248
head(sampleTable)
## SampleID Sample Time Group Batch lev
## 1 Pre_1_0 1 0 crz1.ET 3 crz1.ET_0
## 2 Pre_1_10 1 10 crz1.ET 3 crz1.ET_10
## 3 Pre_1_30 1 30 crz1.ET 3 crz1.ET_30
## 4 Pre_1_60 1 60 crz1.ET 3 crz1.ET_60
## 5 Pre_4_0 4 0 crz1.ET 4 crz1.ET_0
## 6 Pre_4_10 4 10 crz1.ET 4 crz1.ET_10
#all(sampleTable$SampleID == colnames(countTable))
design <- model.matrix(~0 + lev, data=sampleTable)
colnames(design) <- gsub("^lev", "", colnames(design))
y <- DGEList(countTable)
y <- calcNormFactors(y)
v <- voom(y, design)
corfit <- duplicateCorrelation(v, design, block=sampleTable$sample)
#corfit$consensus
v <- voom(y, design, block=sampleTable$sample, correlation=corfit$consensus)
corfit2 <- duplicateCorrelation(v, design, block=sampleTable$sample)
#corfit2$consensus
fit <- lmFit(v, design, block=sampleTable$sample, correlation=corfit2$consensus)
## make contrasts
contrasts <- colnames(design)[-(ncol(design))]
keep <- grepl("\\_0$", contrasts)
A <- contrasts[!keep]
B <- contrasts[keep]
B <- rep(B, each=3)
contrasts <- paste(A, B, sep="-")
names(contrasts) <- A
#contrasts
cm <- makeContrasts(contrasts=contrasts, levels=design)
fit2 <- contrasts.fit(fit, cm)
fit2 <- eBayes(fit2)
results <- sapply(contrasts, topTable,
fit=fit2, number=nrow(fit2), sort.by="none",
simplify=FALSE)
results2 <- mapply(function(.ele, .n){
cbind(FoldChange=2^.ele$logFC, .ele)
}, results, names(results), SIMPLIFY=FALSE)
rown <- sapply(results2, rownames)
#dim(rown)
rown <- apply(rown, 1, unique)
#length(rown) ## all rowname are identical.
results2 <- do.call(cbind, results2)
#dim(results2)
results2$GeneID <- rownames(results2)
results2 <- merge(geneSymbol, results2, by.x= "Gene.stable.ID", by.y = "GeneID", all.y=TRUE)
write.csv(results2, "crz1hog1.ET.vs.crz1.ET.within-condition.differential.expression.genes.double.voom.csv")
Condition-by-time interaction.genes.double.voom.crz1.ET.vs.crz1hog1.ET.csv,contrasts2 <- matrix(contrasts, nrow=3, ncol=2 )
cn <- combn(2,2, simplify = FALSE)
## interaction contrasts
contrasts3 <- lapply(cn, function(.ele){
paste0("(", contrasts2[, .ele[1]], ")-(", contrasts2[, .ele[2]], ")")
})
names(contrasts3) <- "crz1.ET.vs.crz1hog1.ET"
contrasts3
## $crz1.ET.vs.crz1hog1.ET
## [1] "(crz1.ET_10-crz1.ET_0)-(crz1hog1.ET_10-crz1hog1.ET_0)"
## [2] "(crz1.ET_30-crz1.ET_0)-(crz1hog1.ET_30-crz1hog1.ET_0)"
## [3] "(crz1.ET_60-crz1.ET_0)-(crz1.ET_10-crz1hog1.ET_0)"
null <- mapply(function(contr, name){
cm2 <- makeContrasts(contrasts=contr, levels=design)
fit3 <- contrasts.fit(fit, cm2)
fit3 <- eBayes(fit3)
results3 <- sapply(contr, topTable,
fit=fit3, number=nrow(fit3), sort.by="none",
simplify=FALSE)
results4 <- mapply(function(.ele, .n){
cbind(FoldChange=2^.ele$logFC, .ele)
}, results3, names(results3), SIMPLIFY=FALSE)
rown <- sapply(results4, rownames)
dim(rown)
rown <- apply(rown, 1, unique)
length(rown) ## all rowname are identical.
results4 <- do.call(cbind, results4)
dim(results4)
results4$GeneID <- rownames(results4)
results4 <- merge(geneSymbol, results4, by.x= "Gene.stable.ID", by.y = "GeneID", all.y=TRUE)
write.csv(results4, paste0("Condition-by-time interaction.genes.double.voom.", name, ".csv"))
}, contrasts3, names(contrasts3))
#### to use sva, the count table must be filtered to remove genes with extremetly low expression
keep <- rowSums(cpm(countTable)>1) >= 2
countTable.filtered <- countTable[keep, ]
#dim(countTable.filtered)
cds <- DGEList(countTable.filtered)
#### TMM normalization
cds <- calcNormFactors(cds)
### Get normalized count table
scale <- cds$samples$lib.size*cds$samples$norm.factors
normCount <- round(t(t(countTable.filtered )/scale)*mean(scale))
get.sva <- function (expr.data=NULL, meta.data=NULL)
{
mod <- model.matrix(~0 + lev, data=meta.data)
mod0 <- model.matrix(~1, data=meta.data)
num.sva <-svaseq(expr.data, mod, mod0)$n.sv
sv <- svaseq(expr.data, mod, mod0, n.sv=num.sva)$sv
colnames(sv)<- paste0("sv",1:num.sva)
meta.data.sva <-cbind(meta.data, sv )
meta.data.sva
}
final.meta.sva <- get.sva(expr.data = normCount, meta.data = sampleTable)
## Number of significant surrogate variables is: 4
## Iteration (out of 5 ):1 2 3 4 5 Number of significant surrogate variables is: 4
## Iteration (out of 5 ):1 2 3 4 5
## 4 surrogate variables estimated
design <- model.matrix(~ 0 +lev + sv1 + sv2 +sv3 +sv4, data=final.meta.sva)
colnames(design) <- gsub("^lev", "", colnames(design))
sampleTable <- final.meta.sva
y <- DGEList(countTable.filtered)
y <- calcNormFactors(y)
v <- voom(y, design)
corfit <- duplicateCorrelation(v, design, block=sampleTable$Sample)
#corfit$consensus
v <- voom(y, design, block=sampleTable$Sample, correlation=corfit$consensus)
corfit2 <- duplicateCorrelation(v, design, block=sampleTable$Sample)
#corfit2$consensus
fit <- lmFit(v, design, block=sampleTable$Sample, correlation=corfit2$consensus)
## make contrasts
contrasts <- colnames(design)[1:8]
keep <- grepl("\\_0$", contrasts)
A <- contrasts[!keep]
B <- contrasts[keep]
B <- rep(B, each=3)
contrasts <- paste(A, B, sep="-")
names(contrasts) <- A
#contrasts
cm <- makeContrasts(contrasts=contrasts, levels=design)
fit2 <- contrasts.fit(fit, cm)
fit2 <- eBayes(fit2)
results <- sapply(contrasts, topTable,
fit=fit2, number=nrow(fit2), sort.by="none",
simplify=FALSE)
results2 <- mapply(function(.ele, .n){
cbind(FoldChange=2^.ele$logFC, .ele)
}, results, names(results), SIMPLIFY=FALSE)
rown <- sapply(results2, rownames)
#dim(rown)
rown <- apply(rown, 1, unique)
#length(rown) ## all rowname are identical.
results2 <- do.call(cbind, results2)
#dim(results2)
results2$GeneID <- rownames(results2)
results2 <- merge(geneSymbol, results2, by.x= "Gene.stable.ID", by.y = "GeneID", all.y=TRUE)
write.csv(results2, "crz.ET.and.crz1hog1.ET.within-condition.differential.expression.genes.sva.double.voom.csv")
Condition-by-time interaction.genes.sva.double.voom.crz1.ET.vs.crz1hog1.ET.csv,contrasts2 <- matrix(contrasts, nrow=3, ncol=2 )
cn <- combn(2,2, simplify = FALSE)
## interaction contrasts
contrasts3 <- lapply(cn, function(.ele){
paste0("(", contrasts2[, .ele[1]], ")-(", contrasts2[, .ele[2]], ")")
})
names(contrasts3) <- "crz1.ET.vs.crz1hog1.ET"
#contrasts3
null <- mapply(function(contr, name){
cm2 <- makeContrasts(contrasts=contr, levels=design)
fit3 <- contrasts.fit(fit, cm2)
fit3 <- eBayes(fit3)
results3 <- sapply(contr, topTable,
fit=fit3, number=nrow(fit3), sort.by="none",
simplify=FALSE)
results4 <- mapply(function(.ele, .n){
cbind(FoldChange=2^.ele$logFC, .ele)
}, results3, names(results3), SIMPLIFY=FALSE)
rown <- sapply(results4, rownames)
dim(rown)
rown <- apply(rown, 1, unique)
length(rown) ## all rowname are identical.
results4 <- do.call(cbind, results4)
dim(results4)
results4$GeneID <- rownames(results4)
results4 <- merge(geneSymbol, results4, by.x= "Gene.stable.ID", by.y = "GeneID", all.y=TRUE)
write.csv(results4, paste0("Condition-by-time interaction.genes.sva.double.voom.", name, ".csv"))
}, contrasts3, names(contrasts3))
1. Minas, C., Waddell, S. J. & Montana, G. Distance-based differential analysis of gene curves. Bioinformatics 27, 3135 (2011).